Approximate Posterior Inference
Why do we need approximate posterior inference?
Approximate posterior inference methods are essential in Bayesian statistics because the exact posterior is usually intractable: the normalizing constant in Bayes' rule requires integrating over the parameter space, which rarely has a closed-form solution outside of conjugate models.
Types of methods
Grid Approximation
- Grid approximation involves discretizing the parameter space into a grid of points and evaluating the posterior probability at each point.
- It is feasible for models with a small number of parameters and when the parameter space is not too large.
Steps:
- Define a grid of possible values for the parameter(s).
- Compute the posterior probability at each grid point.
- Normalize the probabilities to ensure they sum to one.
- Use these probabilities to approximate summaries (mean, variance, credible intervals) of the posterior distribution.
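A minimal sketch of these steps, assuming a simple coin-flip (Binomial) model with a uniform prior and illustrative data of 6 heads in 9 tosses:

```python
# Grid approximation sketch for a coin-flip model (Binomial likelihood,
# uniform prior). The data and grid size are illustrative assumptions.
import numpy as np

heads, tosses = 6, 9

# 1. Define a grid of possible values for the parameter.
grid = np.linspace(0, 1, 1000)

# 2. Compute the unnormalized posterior at each grid point:
#    prior(theta) * likelihood(data | theta).
prior = np.ones_like(grid)                                # uniform prior
likelihood = grid**heads * (1 - grid)**(tosses - heads)
unnorm_posterior = prior * likelihood

# 3. Normalize so the probabilities sum to one.
posterior = unnorm_posterior / unnorm_posterior.sum()

# 4. Approximate posterior summaries from the grid.
mean = np.sum(grid * posterior)
var = np.sum((grid - mean) ** 2 * posterior)
cdf = np.cumsum(posterior)
ci = (grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)])
print(f"mean={mean:.3f}, var={var:.4f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```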
Laplace Approximation (Quadratic approximation)
- Laplace approximation approximates the posterior distribution with a Gaussian distribution centered at the mode (maximum a posteriori estimate) of the posterior.
Steps:
- Find the mode of the posterior distribution.
- Compute the Hessian matrix (second derivatives) at the mode to estimate the curvature of the log-posterior.
- Use the mode and the inverse of the Hessian to define a Gaussian approximation.
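A minimal sketch of these steps for the same illustrative coin-flip model (uniform prior, 6 heads in 9 tosses), using a numerical optimizer for the mode and a finite-difference estimate of the curvature:

```python
# Laplace (quadratic) approximation sketch for the coin-flip posterior.
# The data and prior are illustrative assumptions.
import numpy as np
from scipy import optimize

heads, tosses = 6, 9

def neg_log_posterior(theta):
    # Uniform prior on (0, 1), so the log-posterior equals the
    # log-likelihood up to an additive constant.
    if theta <= 0 or theta >= 1:
        return np.inf
    return -(heads * np.log(theta) + (tosses - heads) * np.log(1 - theta))

# 1. Find the mode (MAP estimate) of the posterior.
res = optimize.minimize_scalar(neg_log_posterior, bounds=(1e-6, 1 - 1e-6),
                               method="bounded")
mode = res.x

# 2. Estimate the curvature of the negative log-posterior at the mode with
#    a finite-difference second derivative (the 1-D "Hessian").
eps = 1e-5
hess = (neg_log_posterior(mode + eps) - 2 * neg_log_posterior(mode)
        + neg_log_posterior(mode - eps)) / eps**2

# 3. The Gaussian approximation is N(mode, 1 / hess).
sigma = np.sqrt(1.0 / hess)
print(f"Laplace approximation: N(mean={mode:.3f}, sd={sigma:.3f})")
```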
Variational Inference
- Variational inference (VI) approximates the posterior distribution by finding a simpler distribution that is closest to the true posterior, as measured by Kullback-Leibler (KL) divergence (see Information theory and Entropy in Neuroscience#Kullback-Leibler Divergence).
Steps:
- Choose a family of distributions to approximate the posterior (e.g., Gaussian).
- Optimize the parameters of the chosen distribution to minimize the KL divergence to the true posterior.
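A minimal sketch, again using the illustrative coin-flip model: a Gaussian q is fitted to the posterior on the logit scale by minimizing a Monte Carlo estimate of the KL divergence (the negative ELBO up to a constant) with the reparameterization trick. The sample size and optimizer choice are arbitrary assumptions.

```python
# Variational inference sketch: fit q(z) = N(mu, sigma) to the coin-flip
# posterior on the logit scale z = logit(theta).
import numpy as np
from scipy import optimize

heads, tosses = 6, 9
rng = np.random.default_rng(0)
eps = rng.standard_normal(2000)        # fixed noise for a stable objective

def log_unnorm_posterior(z):
    # Uniform prior on theta; includes the Jacobian of the logistic transform.
    theta = np.clip(1.0 / (1.0 + np.exp(-z)), 1e-12, 1 - 1e-12)
    log_lik = heads * np.log(theta) + (tosses - heads) * np.log(1 - theta)
    log_jac = np.log(theta) + np.log(1 - theta)
    return log_lik + log_jac

def neg_elbo(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps               # reparameterized samples from q
    log_q = -0.5 * eps**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    # KL(q || p) = E_q[log q - log p], up to the unknown normalizing constant.
    return np.mean(log_q - log_unnorm_posterior(z))

res = optimize.minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu, sigma = res.x[0], np.exp(res.x[1])
print(f"q(z) = N({mu:.3f}, {sigma:.3f}) on the logit scale")
```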
Markov Chain Monte Carlo (MCMC)
- MCMC methods generate samples from the posterior distribution by constructing a Markov chain that has the desired posterior distribution as its equilibrium distribution.
- Common Algorithms
- Metropolis-Hastings Algorithm
- Proposes a new sample from a proposal distribution and accepts or rejects it with a probability based on the ratio of posterior densities at the proposed and current values (see the sketch after this list).
- Gibbs Sampling (see Metropolis-Hastings Algorithm#Gibbs Sampling)
- Samples each parameter in turn from its conditional distribution given the current values of the other parameters.
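A minimal sketch of random-walk Metropolis-Hastings for the same illustrative coin-flip posterior; the step size, chain length, and burn-in are arbitrary choices:

```python
# Random-walk Metropolis-Hastings sketch for the coin-flip posterior.
import numpy as np

heads, tosses = 6, 9
rng = np.random.default_rng(0)

def log_unnorm_posterior(theta):
    # Uniform prior on (0, 1); return -inf outside the support.
    if theta <= 0 or theta >= 1:
        return -np.inf
    return heads * np.log(theta) + (tosses - heads) * np.log(1 - theta)

n_steps, step_size, burn_in = 10_000, 0.1, 1_000
theta = 0.5                            # starting value of the chain
samples = []

for _ in range(n_steps):
    # Propose a new value from a symmetric (Gaussian) proposal distribution.
    proposal = theta + step_size * rng.standard_normal()
    # Accept with probability min(1, p(proposal) / p(current)).
    log_accept = log_unnorm_posterior(proposal) - log_unnorm_posterior(theta)
    if np.log(rng.uniform()) < log_accept:
        theta = proposal
    samples.append(theta)

samples = np.array(samples[burn_in:])
print(f"posterior mean ~ {samples.mean():.3f}, sd ~ {samples.std():.3f}")
```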
Expectation Propagation
- Expectation Propagation is an iterative algorithm that approximates the posterior by matching moments (mean and variance) between the true posterior and a simpler distribution.
Steps:
- Approximate each factor in the posterior (e.g., likelihood, prior) with a simpler distribution.
- Iteratively update these approximations to minimize the KL divergence to the true posterior in a way that balances the contributions of all factors.
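As a rough illustration, the sketch below performs a single moment-matching update of the kind EP iterates: a Gaussian cavity distribution multiplied by one non-Gaussian factor (here a probit likelihood term, purely an illustrative choice) is replaced by the Gaussian with the same mean and variance, computed numerically on a grid. A full EP implementation would loop this update over all factors until the site approximations converge.

```python
# Moment-matching step at the heart of expectation propagation (sketch).
import numpy as np
from scipy import stats

# Cavity distribution q_cavity(theta) = N(m, v) formed from the other factors.
m, v = 0.0, 4.0

def factor(theta):
    # One non-Gaussian likelihood factor: Phi(theta), a probit term for y = 1.
    return stats.norm.cdf(theta)

# Tilted distribution: cavity * factor, normalized numerically on a grid.
grid = np.linspace(m - 10 * np.sqrt(v), m + 10 * np.sqrt(v), 5000)
tilted = stats.norm.pdf(grid, m, np.sqrt(v)) * factor(grid)
tilted /= tilted.sum()

# Match moments: the new Gaussian approximation has the tilted mean/variance.
new_mean = np.sum(grid * tilted)
new_var = np.sum((grid - new_mean) ** 2 * tilted)
print(f"moment-matched approximation: N({new_mean:.3f}, {new_var:.3f})")
```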
Comparison between methods
| Approximation method | Pros | Cons |
|---|---|---|
| Grid Approximation | Simple and intuitive | Computationally expensive for high-dimensional parameter spaces |
| Laplace Approximation | Simple to implement; works well for unimodal, approximately Gaussian posteriors | Poor approximation for non-Gaussian or multi-modal posteriors; requires second-order derivatives, which can be computationally expensive |
| Variational Inference | Often faster than MCMC; scales well to large datasets | Quality of the approximation depends on the chosen family of distributions; may not capture all features of the true posterior, especially if it is multi-modal or highly skewed |
| MCMC | Can handle high-dimensional and complex posterior distributions | Can be slow to converge and computationally intensive; requires careful tuning of algorithm parameters |
| Expectation Propagation | Provides good approximations for certain types of models, especially in Bayesian machine learning; can handle complex posterior distributions better than some other methods | More complex to implement and understand; may not converge or provide good approximations in all cases |